MDP Models


A Novel MDP Decomposition Framework for Scalable UAV Mission Planning in Complex and Uncertain Environments

Quamar, Md Muzakkir, Nasir, Ali, ELFerik, Sami

arXiv.org Artificial Intelligence

This paper presents a scalable and fault-tolerant framework for unmanned aerial vehicle (UAV) mission management in complex and uncertain environments. The proposed approach addresses the computational bottleneck inherent in solving large-scale Markov Decision Processes (MDPs) by introducing a two-stage decomposition strategy. In the first stage, a factor-based algorithm partitions the global MDP into smaller, goal-specific sub-MDPs by leveraging domain-specific features such as goal priority, fault states, spatial layout, and energy constraints. In the second stage, a priority-based recombination algorithm solves each sub-MDP independently and integrates the results into a unified global policy, using a meta-policy for conflict resolution. Importantly, a theoretical analysis shows that, under mild probabilistic independence assumptions, the combined policy is provably equivalent to the optimal global MDP policy. By decomposing large MDPs into tractable subproblems with this equivalence guarantee, the framework enables real-time policy updates in complex mission environments. Extensive simulations validate the effectiveness of the method, demonstrating orders-of-magnitude reductions in computation time without sacrificing mission reliability or policy optimality. The result is a practical and robust foundation for scalable decision-making in real-time UAV mission execution.
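The two-stage pipeline this abstract describes can be sketched in miniature with tabular value iteration: solve goal-specific sub-MDPs independently, then recombine their policies with a priority rule. This is an illustrative toy, not the paper's algorithm: the corridor dynamics, the two goals, and the priority weights are all invented, and the factor-based partitioning and meta-policy conflict resolution are reduced to a single argmax.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve a small MDP with transitions P[a, s, s'] and rewards R[s, a]."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Shared dynamics: a 5-state corridor with deterministic left/right moves.
n = 5
P = np.zeros((2, n, n))
for s in range(n):
    P[0, s, max(s - 1, 0)] = 1.0      # action 0: move left
    P[1, s, min(s + 1, n - 1)] = 1.0  # action 1: move right

# Stage 1: factor the task into goal-specific sub-MDPs
# (goal A pays off at state 0, goal B at state 4).
R_a = np.zeros((n, 2)); R_a[0, :] = 1.0
R_b = np.zeros((n, 2)); R_b[4, :] = 1.0
V_a, pi_a = value_iteration(P, R_a)
V_b, pi_b = value_iteration(P, R_b)

# Stage 2: priority-based recombination -- in each state, follow the
# sub-policy whose priority-weighted value is largest (goal A outranks B).
priority = np.stack([2.0 * V_a, 1.0 * V_b])
meta_policy = np.where(priority.argmax(axis=0) == 0, pi_a, pi_b)
```

Because goal A outranks goal B everywhere in this toy, the meta-policy simply adopts goal A's sub-policy; with state-dependent priorities the recombination would switch between sub-policies per state.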



Solving Finite-Horizon MDPs via Low-Rank Tensors

Rozada, Sergio, Orejuela, Jose Luis, Marques, Antonio G.

arXiv.org Artificial Intelligence

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs the policies, and therefore the value functions (VFs), are not stationary. This aggravates the challenges of high-dimensional MDPs, which already suffer from the curse of dimensionality and high sample complexity. To address these issues, we propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable. We introduce an optimization-based framework for solving the Bellman equations with low-rank constraints, along with block-coordinate descent (BCD) and block-coordinate gradient descent (BCGD) algorithms, both with theoretical convergence guarantees. For scenarios where the system dynamics are unknown, we adapt the proposed BCGD method to estimate the VFs from sampled trajectories. Numerical experiments demonstrate that the proposed framework reduces computational demands in controlled synthetic scenarios and in more realistic resource allocation problems.
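To make the "non-stationary VFs as a low-rank tensor" idea concrete, the following sketch computes the exact horizon-indexed value functions of a toy grid MDP by backward induction and then checks how well a low-rank truncation compresses them. It uses a plain truncated SVD of the unfolded tensor in place of the paper's BCD/BCGD solvers; the grid, horizon, and reward are invented for illustration.

```python
import numpy as np

# Finite-horizon MDP on an X-by-Y grid: the agent moves one step along
# either axis (clipped at the walls) and earns reward 1 for every step
# it spends on the goal corner.
X, Y, H = 6, 6, 8
reward = lambda x, y: float(x == X - 1 and y == Y - 1)

# Backward induction: the optimal VFs V[h] are non-stationary in h.
V = np.zeros((H + 1, X, Y))
for h in reversed(range(H)):
    for x in range(X):
        for y in range(Y):
            moves = [(min(x + 1, X - 1), y), (max(x - 1, 0), y),
                     (x, min(y + 1, Y - 1)), (x, max(y - 1, 0))]
            V[h, x, y] = reward(x, y) + max(V[h + 1, nx, ny] for nx, ny in moves)

# Low-rank view: unfold the (H+1, X, Y) tensor along the horizon mode and
# measure how much of its energy a rank-2 truncation already captures.
M = V.reshape(H + 1, X * Y)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
rank2 = (U[:, :2] * s[:2]) @ Vt[:2]
rel_err = np.linalg.norm(M - rank2) / np.linalg.norm(M)
```

In this toy the value tensor has the closed form V[h, x, y] = max(0, (H - h) - d), with d the Manhattan distance to the goal, so the horizon-unfolded matrix is highly compressible and the rank-2 relative error is small.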



Optimized Task Assignment and Predictive Maintenance for Industrial Machines using Markov Decision Process

Nasir, Ali, Mekid, Samir, Sawlan, Zaid, Alsawafy, Omar

arXiv.org Artificial Intelligence

The importance of predictive maintenance is well recognized in the industrial sector for several reasons: it reduces machine downtime, helps cut production costs, and extends the life of machines. Consequently, predictive maintenance is one of the key areas of research in the scientific community. Initially, predictive maintenance was time-based, but later on, with advances in sensing technology, condition-based maintenance (CBM) gained more popularity. Maintenance of machine tools involves two key stages: diagnosis and prognosis. Prognosis deals with predicting the remaining useful life (RUL) of the machine, whereas diagnosis is concerned with detecting and identifying faults in the machine. Major approaches to prognosis include data-based, knowledge-based, and physics-based (model-based) approaches; diagnosis, on the other hand, follows centralized or distributed approaches [1]. Key challenges in predictive maintenance include 1) dealing with noisy sensor data, 2) uncertainty in operating conditions, and 3) the diversity of tasks assigned to the machine. A comparison between time-based and condition-based maintenance strategies is presented in [2].
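A minimal CBM-style decision model can be written down as a three-state MDP and solved by value iteration. The sketch below is hypothetical (all transition probabilities and costs are invented), but it shows the kind of operate-vs-maintain trade-off such a formulation captures.

```python
import numpy as np

# Toy condition-based maintenance MDP: states {0: good, 1: worn, 2: failed},
# actions {0: operate, 1: maintain}. All numbers are illustrative only.
P = np.array([
    # action 0: operate -- the machine degrades stochastically
    [[0.9, 0.1, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]],   # failed is absorbing without maintenance
    # action 1: maintain -- restores the machine to good condition
    [[1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]],
])
R = np.array([
    [10.0,  -2.0],   # good: operating is productive, maintenance costs
    [ 6.0,  -2.0],   # worn: less productive
    [ 0.0, -10.0],   # failed: repair is expensive, operating yields nothing
])

gamma = 0.95
V = np.zeros(3)
for _ in range(2000):                      # value iteration to convergence
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)                  # greedy policy from final Q
```

With these numbers the optimal policy operates while the machine is good but maintains as soon as it is worn, i.e. it pays the small maintenance cost to avoid the expensive absorbing failure state.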


Towards Transparent Robotic Planning via Contrastive Explanations

Chen, Shenghui, Boggess, Kayla, Feng, Lu

arXiv.org Artificial Intelligence

Providing explanations of chosen robotic actions can help to increase the transparency of robotic planning and improve users' trust. Social sciences suggest that the best explanations are contrastive, explaining not just why one action is taken, but why one action is taken instead of another. We formalize the notion of contrastive explanations for robotic planning policies based on Markov decision processes, drawing on insights from the social sciences. We present methods for the automated generation of contrastive explanations with three key factors: selectiveness, constrictiveness, and responsibility. The results of a user study with 100 participants on the Amazon Mechanical Turk platform show that our generated contrastive explanations can help to increase users' understanding and trust of robotic planning policies while reducing users' cognitive burden.
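At its core, a contrastive explanation for an MDP policy contrasts the chosen action (the fact) with an alternative (the foil). The sketch below reduces this to a Q-value comparison; the Q-table, state, and action names are hypothetical, and the paper's selectiveness, constrictiveness, and responsibility machinery is not modeled.

```python
import numpy as np

def contrastive_explanation(Q, state, fact, foil, actions):
    """Explain why `fact` was chosen over `foil` in `state` by contrasting
    their action values -- a minimal fact-vs-foil comparison only."""
    gap = Q[state, fact] - Q[state, foil]
    return (f"In state {state}, '{actions[fact]}' is preferred over "
            f"'{actions[foil]}' because its expected return is higher "
            f"by {gap:.2f}.")

# Hypothetical Q-table for a 2-state, 2-action robot navigation MDP.
Q = np.array([[5.0, 3.5],
              [1.0, 4.0]])
msg = contrastive_explanation(Q, 0, 0, 1, ["go-left", "go-right"])
```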


Representation Balancing MDPs for Off-policy Policy Evaluation

Liu, Yao, Gottesman, Omer, Raghu, Aniruddh, Komorowski, Matthieu, Faisal, Aldo A., Doshi-Velez, Finale, Brunskill, Emma

Neural Information Processing Systems

We study the problem of off-policy policy evaluation (OPPE) in RL. In contrast to prior work, we consider how to estimate both the individual policy value and the average policy value accurately. We draw inspiration from recent work in causal reasoning and propose a new finite-sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop a learning algorithm for an MDP model with a balanced representation, and show that our approach can yield substantially lower MSE on common synthetic benchmarks and an HIV treatment simulation domain.
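The objective the abstract describes, a model-fit term plus a term that balances the learned representation across the behavior and target state distributions, can be caricatured in a few lines. Everything below is a schematic under invented data: a linear representation, a mean-embedding distance standing in for the bound's distributional penalty, and a fixed (untrained) weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_mmd(phi_b, phi_t):
    """Squared distance between mean feature embeddings -- a simple
    integral probability metric used here as the balancing penalty."""
    return float(np.sum((phi_b.mean(axis=0) - phi_t.mean(axis=0)) ** 2))

# Hypothetical data: states visited under the behavior vs. target policy
# (the target distribution is shifted, so the raw states are unbalanced).
s_behavior = rng.normal(0.0, 1.0, size=(256, 4))
s_target = rng.normal(0.5, 1.0, size=(256, 4))

def objective(W, X_b, y_b, X_t, alpha=1.0):
    """Model fit on behavior data plus a representation-balancing term,
    mirroring the bound's two pieces: factual error + distribution gap."""
    phi_b, phi_t = X_b @ W, X_t @ W        # shared linear representation
    mse = np.mean((phi_b.sum(axis=1) - y_b) ** 2)
    return mse + alpha * linear_mmd(phi_b, phi_t)

W = rng.normal(size=(4, 3))
y = s_behavior.sum(axis=1)                 # synthetic regression target
loss = objective(W, s_behavior, y, s_target)
```

Minimizing such an objective over W trades prediction accuracy on the behavior data against how distinguishable the two distributions remain in the learned representation, which is the balancing idea the bound formalizes.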


Before we can find a model, we must forget about perfection

Dobrev, Dimiter

arXiv.org Artificial Intelligence

With Reinforcement Learning we assume that a model of the world exists, and furthermore that the model in question is perfect (i.e. it describes the world completely and unambiguously). This article demonstrates that it does not make sense to search for the perfect model, because such a model is too complicated and practically impossible to find. We show that we should abandon the pursuit of perfection and pursue Event-Driven (ED) models instead. These models are a generalization of Markov Decision Process (MDP) models. This generalization is essential, because nothing can be found without it. Rather than a single MDP, we aim to find a raft of neat, simple ED models, each describing a simple dependency or property. In other words, we replace the search for a singular, complex perfect model with a search for a large number of simple models.